Biostatistics For Dummies (Monika Wahi John Pezzullo)

Learners sometimes think that as long as they sort a spreadsheet of data by a column containing

any value and then select a sample of rows from the top, they have automatically obtained an

SRS. This is not correct! If you think about it more carefully, you will realize why. If you sort

names alphabetically, you will see patterns in names (such as religious names, or names

associated with certain languages, countries, or ethnicities). If you sort by another identifying

column, such as email address or city of residence, you will again see patterns in the data. If you

take rows from the top of such sorted data, your sample will be biased, nonrandom, and unrepresentative.

That is why it is important to sort by a column of random numbers produced by an RNG if you are taking an SRS

electronically.
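
As a rough illustration of sorting by an RNG column, here is a minimal Python sketch. The function name simple_random_sample, the patient_NNN labels, and the list size of 200 are hypothetical choices made for this example; the point is only that rows are ordered by a random key rather than by a name, email address, or city.

```python
import random


def simple_random_sample(rows, n, seed=None):
    """Return n rows chosen by sorting on a column of random numbers."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    # Attach a random key to every row; this plays the role of the RNG column.
    keyed = [(rng.random(), row) for row in rows]
    # Sort on the random key only, so the original ordering has no influence.
    keyed.sort(key=lambda pair: pair[0])
    # Taking rows "from the top" is now equivalent to a simple random sample.
    return [row for _, row in keyed[:n]]


# Hypothetical patient list that arrives sorted alphabetically.
patients = [f"patient_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(patients, 20, seed=42)
print(sample)
```

Sorting on the random key has the same effect as shuffling the list, so the built-in random.sample function would serve equally well; the explicit key column is shown because it mirrors what you would do in a spreadsheet.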

Taking an SRS intuitively seems like the optimal way to draw a representative sample.

However, there are caveats. In the previous example, you started with a clinical population in the

form of a printed or electronic list of patients from which you could draw a sample. But what if

you want to sample from patients presenting to the emergency department during a particular

period of time in the future? Such a list does not exist. In a situation like that, you could use

systematic sampling, which is explained later in the section “Engaging in systematic sampling.”

Another caveat of SRS is that it can miss important subgroups. Imagine that in your list of clinic

patients, only 10 percent were pediatric patients (defined as patients under the age of 18 years).

Because 10 percent of 20 is two, you may expect that a random sample of 20 patients from a

population where 10 percent are pediatric would include two pediatric patients. But in practice, in a

situation like this, it would not be unusual for an SRS of 20 patients to include zero pediatric patients.

If your sample needs to ensure representation of certain subgroups, then you should consider using

stratified sampling instead.
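
To put a number on how often an SRS of 20 would contain no pediatric patients, the short Python sketch below works through the arithmetic. The first calculation treats each draw as independent; the second assumes a hypothetical list of 1,000 patients, 100 of them pediatric (a list size that is not stated in the example), and computes the chance for sampling without replacement. Both come out to roughly 12 percent, or about one sample in eight.

```python
from math import comb

sample_size = 20
pediatric_share = 0.10

# Approximation: each of the 20 draws independently misses the pediatric
# group with probability 0.9, so all 20 miss with probability 0.9 ** 20.
p_zero_approx = (1 - pediatric_share) ** sample_size

# Sampling without replacement from a finite list (hypergeometric).
# ASSUMPTION: a hypothetical list of 1,000 patients, 100 of them pediatric.
population = 1000
pediatric = 100
p_zero_finite = comb(population - pediatric, sample_size) / comb(population, sample_size)

print(f"Independent-draw approximation: {p_zero_approx:.3f}")
print(f"Finite list of {population}: {p_zero_finite:.3f}")
```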

Taking a stratified sample

In the previous section, we discussed a scenario where 10 percent of the patients of a clinic are

pediatric patients, and taking a sample of 20 using an SRS from a list of the clinic population runs the

risk of not including any pediatric patients. If pediatric patients are important to the study, then this

problem can be solved with stratified sampling. The word stratum refers to a layer (as you see in a

layer cake), and the word strata is the plural of stratum. Stratified sampling can be seen as sampling

from strata, or layers.

In our scenario, if you choose to draw a stratified sample by age groups, you would first have to

separate the list into a pediatric list and a list of everyone else. Then, you could take an SRS from

each. Because you are concerned about each stratum, you could make a rule that even though pediatric

patients make up only 10 percent of the background population, you want them to make up 50 percent

of your sample. If you did that, you would oversample the pediatric stratum by taking an SRS of 10 from the

pediatric list, while also taking an SRS of 10 from the list of everyone else.
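
The two-list procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed implementation: the function stratified_sample, the record fields id and age, and the made-up list of 200 patients (20 of them pediatric) are all hypothetical.

```python
import random


def stratified_sample(records, stratum_of, sizes, seed=None):
    """Split records into strata, then take an SRS of the requested size from each."""
    rng = random.Random(seed)
    # Step 1: separate the list into one sub-list per stratum.
    strata = {}
    for record in records:
        strata.setdefault(stratum_of(record), []).append(record)
    # Step 2: take an SRS (without replacement) from each stratum.
    return {name: rng.sample(strata[name], n) for name, n in sizes.items()}


# Hypothetical clinic list: 200 patients, 10 percent of them under 18.
patients = [{"id": i, "age": 8 if i < 20 else 45} for i in range(200)]


def age_group(patient):
    return "pediatric" if patient["age"] < 18 else "everyone else"


# Oversample the pediatric stratum so that it makes up half of the sample.
sample = stratified_sample(
    patients, age_group, {"pediatric": 10, "everyone else": 10}, seed=1
)
print(len(sample["pediatric"]), len(sample["everyone else"]))
```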